Machine and Human Performance for Single and Multidocument Summarization

Author

  • Judith D. Schlesinger
Abstract

coherency, and be able to draw the “best” information from a set of documents. Automatic single-document text summarization [1] has been an active research area since the 1950s, with a renaissance of approaches since the 1990s. Human single-document summarization is well defined when guidelines and recommendations drive performance [2, 3]. System-generated single-document summaries, while not always matching well with reference summaries, are generally of good quality and have proved useful.
In contrast to the single-document task, multiple-document summarization development (automated or not) lacks complementary documentation of procedures and methodologies for human performance. Although researchers have explored various strategies for analyzing documents in a collection and then synthesizing and condensing information to produce multidocument summaries, they have not yet seen strong system performance. The lack of guidelines has a greater impact on multidocument summarization than on the single-document task. Analyzing and synthesizing such extensive information tax both human and machine processing capabilities.
The recent NIST-sponsored (National Institute of Standards and Technology) Document Understanding Conference II (DUC 2002) [4] evaluated automatic multidocument summarization capability using 59 single-document collections with an average of 10 documents per set. These sets covered single events, multiple related events, and biographies. NIST provided 30 document sets, including single documents, single-document human-generated reference summaries, and multidocument human-generated reference summaries for each set. Additional multidocument summaries for each set, written by different human summarizers, were also available, as was data from the DUC 2001 conference. For evaluation, human evaluators compared system-generated summaries to the human reference summaries.
Our prototype multidocument summarizer operates by first generating single-document summaries and then selecting from those sentences to produce the multidocument summary. It is based on our current text summarization system, which delivers, in real time, indicative summaries for a high-volume, heterogeneous document collection to US government users. Thus, the constraints of our environment tend to prohibit the use of more knowledge-intensive approaches but truly represent typical requirements for commercial viability. We believe our environment highlights actual practical performance demands and NLP challenges.
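
The two-stage idea described above (summarize each document, then select from the resulting sentences) might be sketched as follows. This is only an illustration under stated assumptions: the abstract does not say how sentences are scored or selected, so the term-frequency scoring, the naive sentence splitter, the exact-duplicate filter, and all function names here are hypothetical stand-ins, not the authors' system.

```python
import re
from collections import Counter


def split_sentences(text):
    """Naive splitter (assumption): '.', '!' or '?' followed by whitespace ends a sentence."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    """Lowercase alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", sentence.lower())


def score_sentences(sentences, term_freq):
    """Score each sentence by the average frequency of its terms (assumed heuristic)."""
    scored = []
    for s in sentences:
        toks = tokenize(s)
        score = sum(term_freq[t] for t in toks) / len(toks) if toks else 0.0
        scored.append((score, s))
    return scored


def summarize_document(text, k=2):
    """Stage 1: pick the k highest-scoring sentences of a single document."""
    sentences = split_sentences(text)
    tf = Counter(t for s in sentences for t in tokenize(s))
    ranked = sorted(score_sentences(sentences, tf), key=lambda p: p[0], reverse=True)
    return [s for _, s in ranked[:k]]


def summarize_collection(docs, per_doc=2, total=3):
    """Stage 2: pool the single-document summaries and select from those sentences."""
    candidates = []
    for d in docs:
        candidates.extend(summarize_document(d, per_doc))
    # Re-score candidates against term frequencies over the whole collection.
    tf = Counter(t for d in docs for t in tokenize(d))
    ranked = sorted(score_sentences(candidates, tf), key=lambda p: p[0], reverse=True)
    summary, seen = [], set()
    for _, s in ranked:
        key = frozenset(tokenize(s))
        if key in seen:  # skip sentences with an identical token set
            continue
        seen.add(key)
        summary.append(s)
        if len(summary) == total:
            break
    return summary
```

Calling `summarize_collection(docs)` with a list of document strings returns at most `total` sentences, every one of which already appeared in some single-document summary. A real-time system of the kind described would presumably need stronger redundancy handling than the exact-match filter used here.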

Similar resources

Query-Based Summarization: A survey

This paper presents a survey of recent extractive query-based summarization techniques. We explore approaches for single document and multidocument summarization. Knowledge-based and machine learning methods for choosing the most relevant sentences from documents with respect to a given query are considered. Further, we expose tailored summarization techniques for particular domains like medica...

Text Summarization based on Itemized Sentences and Similar Parts Detection between Documents

In this paper, we propose a text summarization system for a single document and multiple documents. The system for a single one extracts sentences from a document and itemizes them to generate a summary. We applied this mechanism for Task A (single document summarization). We also utilized this mechanism for multi-document summarization (Task B) except for itemization mechanism. The system for ...

Identifying Multidocument Relations

The digital world generates an incredible accumulation of information. This results in redundant, complementary, and contradictory information, which may be produced by several sources. Applications as multidocument summarization and question answering are committed to handling this information and require the identification of relations among the various texts in order to accomplish their task...

Towards a Unified Approach to Simultaneous Single-Document and Multi-Document Summarizations

Single-document summarization and multidocument summarization are very closely related tasks and they have been widely investigated independently. This paper examines the mutual influences between the two tasks and proposes a novel unified approach to simultaneous single-document and multidocument summarizations. The mutual influences between the two tasks are incorporated into a graph model an...

Presenting an Intelligent, Semantics-Based System for Evaluating Text Summarization Systems

Summarizers and machine translators have attracted much attention, and many efforts to build such tools have been made around the world. For Farsi, as for other languages, there have been efforts in this field, so evaluating such tools is of great importance. Human evaluations of machine summarization are extensive but expensive. Human evaluations can take months to f...

Publication date: 2001